Abstract

Motivated by applications in Markov chain Monte Carlo, we discuss
what it means for one Markov chain to be an approximation to another.
Specifically included in that discussion are situations in which a
Markov chain with continuous state space is approximated by one with
finite state space. A simple sufficient condition for close approximation
is derived, which indicates the existence of three distinct approximation
regimes. Counterexamples are presented to show that these regimes
are real and not artifacts of the proof technique. An application to
the "ball walk" of Lovász and Simonovits is provided as an illustrative
example.

1 Discussion

Monte Carlo algorithms compute approximate solutions to hard problems by
extracting information from random samples. Markov chain Monte Carlo
(MCMC) algorithms add a further ingredient, namely Markov chain
simulation, to this recipe. The idea is to devise a Markov chain $(X_t : t \in \mathbb{N})$
whose stationary distribution is the one from which we would like to sample.
The required samples are drawn from a realisation of this Markov chain
obtained by computer simulation. To avoid excessive bias, the samples must
come from a time step of the realisation that is beyond the mixing time of
the Markov chain, i.e., the time $\tau$ at which $X_\tau$ is close enough to stationarity.

The analysis of MCMC algorithms clearly requires us to bound the mix- 
ing time from above, and several approaches have been proposed for achiev- 
ing this goal. However, the computer simulation of the Markov chain will in 
general be imperfect. The transition probabilities may not be exactly what 
they should be. Even worse, the state space may be uncountably infinite, 
so we cannot even represent the states exactly in the computer. Does this 
matter? Obviously, the answer depends on the accuracy with which the 
Markov chain is simulated. The aim of this note is to quantify the required 
accuracy. 

As a paradigmatic example, consider the (lazy) ball walk in a convex
body, due to Lovász and Simonovits [7]. The state space in this instance is
a convex body in $\mathbb{R}^n$, i.e., a compact convex set $K \subset \mathbb{R}^n$ of full dimension.
The transition kernel of the ball walk (with step-size $r > 0$) is defined by
the following trial: Suppose $X_t = x$. Choose a point $y$ uniformly at random
(u.a.r.) from the ball of radius $r$ centred at $x$. If $y \in K$ then $X_{t+1}$ is $y$,
otherwise $X_{t+1}$ is $x$. The state space is continuous, and so is the transition
kernel. In any implementation it would be necessary to approximate the states
in the realisation of the ball walk by vectors of finite-precision real numbers;
likewise, the transition kernel would need to be approximated by some discrete
distribution.
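
The trial just described can be sketched in a few lines of Python. This is only an illustration in floating-point arithmetic, so it is itself an instance of the perturbed chain rather than the ideal one. The membership oracle `in_unit_ball`, the helper names and the parameter values are assumptions made for the example and do not come from the original analysis.

```python
import math
import random

def sample_ball(center, r, rng):
    """Return a point uniform in the Euclidean ball of radius r about center."""
    n = len(center)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]    # rotationally symmetric direction
    norm = math.sqrt(sum(v * v for v in g))
    scale = r * rng.random() ** (1.0 / n) / norm   # radial law with density ~ s^(n-1)
    return [c + scale * v for c, v in zip(center, g)]

def ball_walk_step(x, r, in_body, rng):
    """One trial of the ball walk: propose y in B(x, r), move only if y lies in K."""
    y = sample_ball(x, r, rng)
    return y if in_body(y) else x

def in_unit_ball(p):
    """Hypothetical membership oracle: K is taken to be the unit ball."""
    return sum(v * v for v in p) <= 1.0

def run_walk(steps, n=3, r=0.5, seed=0):
    """Run the walk from the origin and return the visited states."""
    rng = random.Random(seed)
    x = [0.0] * n
    path = [x]
    for _ in range(steps):
        x = ball_walk_step(x, r, in_unit_ball, rng)
        path.append(x)
    return path
```

Any convex body can be substituted by swapping the oracle; the walk only ever queries membership in $K$.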

This example motivates our general setting. There is an "ideal" ergodic
Markov chain $(\Omega, P)$, with state space $\Omega$ and transition kernel $P$, whose
stationary distribution and mixing time are known. Then there is a perturbed
Markov chain $(\hat\Omega, \hat P)$, which is the one actually implemented. We assume
$\hat\Omega \subseteq \Omega$. Usually, $\hat\Omega$ will be finite, though we don't assume this. Sometimes,
as we have seen, $\Omega$ will be uncountably infinite. We do not assume that
$(\hat\Omega, \hat P)$ is necessarily ergodic. For example, in an implementation of the
ball walk, the low order bits in the finite real number approximations might
depend deterministically on those of the start state. In order to compare
the $t$-step distributions of the two Markov chains, we regard $\hat P^t(x,\cdot)$ as a
probability distribution on $\Omega$, using the convention $\hat P^t(x, A) := \hat P^t(x, A \cap \hat\Omega)$.

Observe that in general $\hat P^t(x,\cdot)$ does not converge to $P^t(x,\cdot)$ in the usual
total variation distance (half the $\ell_1$-norm). Indeed, any finite approximation
to the ball walk will necessarily remain at total variation distance 1
throughout, since $\hat\Omega$ has measure zero in $\Omega$. It is clear, then, that any
discussion of finite approximations to the ball walk must necessarily involve
some underlying metric $d$ on $\Omega$. In the case of the ball walk it would be
natural to take $d$ to be the Euclidean metric.

So regard $(\Omega, d)$ as a metric space, and look at convergence in the Prohorov
metric: for Borel probability measures $\pi$ and $\pi'$ on $\Omega$, define

$$\varrho(\pi,\pi') := \inf\{\epsilon : \pi(A) \le \pi'(A^{\epsilon}) + \epsilon \text{ for all closed } A\}, \tag{1}$$

where $A^{\epsilon} := \{y : d(y,A) \le \epsilon\}$ and $d(y,A) := \inf\{d(y,x) : x \in A\}$. (It can
be shown that taking an infimum just over closed sets is equivalent to taking
an infimum over all Borel sets.) The appearance of the Prohorov metric in
this context is not novel, as it has been used by a few people, for example
Diamond et al. in studying approximations to dynamical systems. For
reasons that will be mentioned in passing at the relevant moment, we need
the technical condition that $(\Omega, d)$ is a separable metric space. This will
always be the case in practice (e.g., for Euclidean space $(\mathbb{R}^n, \ell_2)$).

Upon reflection, there seem to be three prerequisites for $(\hat\Omega, \hat P)$ to behave
as a close approximation to $(\Omega, P)$.

1. $\hat P(x,\cdot)$ should be close to $P(x,\cdot)$ for all $x \in \hat\Omega$. This is the most
obviously necessary condition. The ball-walk example suggests that
"close" should be measured in the Prohorov metric, and not total
variation.

2. $P(x,\cdot)$ should vary smoothly with $x$. This condition is necessary to
exclude "chaotic" systems whose stationary distribution is very sensitive
to small changes in $P$.

3. $(\Omega, P)$ should be rapidly mixing. Otherwise $(\Omega, P)$ and $(\hat\Omega, \hat P)$ might
diverge slowly over time, even if conditions (1) and (2) are met.
Consider, e.g., a random walk on $\{0, 1, \ldots, 2^n - 1\}$ with a drift of order
$2^{-n}$.

Conditions (1) and (3) were noted by Azar et al., whose motivation
was similar to ours, but who considered the more restricted situation $\hat\Omega = \Omega$.
They had no need of (2) since they were dealing only with Markov chains
with discrete state spaces.

Aside from Azar et al., there is also related work on the simulation of
dynamical systems, for example, by Shardlow and Stuart. Here, the
dynamical system may be in continuous time, and any computer simulation 
will involve discretisation of time as well as of the state space. (Indeed, it is 
fair to say that the discretisation of time is a greater concern in this setting.) 
Where this work diverges from that in the dynamical systems literature is 
in the emphasis on non-asymptotic bounds that explore the dependence of 
errors on some measure of the size or complexity of the Markov chain. For 
example, in the simple random walk example from condition (3) above, we 
are interested in quantifying, in terms of the size of the state space of the 
random walk, how close the transition kernel $\hat P(x,\cdot)$ must be to $P(x,\cdot)$ to
achieve an adequate approximation. In the case of the ball walk, we may 
want to quantify the closeness of approximation in terms of the dimension n, 
step-size r, and the diameter of the convex body K. This concern seems 
less of an issue in the dynamical systems literature. 

Although Theorem 2 is billed as the main result, it must be admitted
that its conclusion is unsurprising and its proof banal. Nevertheless, it
may have some utility in justifying the use of theoretical mixing-time upper
bounds in imperfect computer simulations, where real numbers are carried to
bounded accuracy and random variables are sampled from not quite the right
distributions. An example application is given in §5. The main theoretical
contribution of this note is in §4, where it is shown, through a sequence of
counterexamples, that the three possible behaviours described in Theorem 2
are real, and not artifacts of the proof. These examples will hopefully shed
light on the main mechanisms at work in this setting.



2 Definitions and preliminaries 

Observe that the two occurrences of $\epsilon$ in definition (1) have different
functions: one limits variation in position, and the other variation in probability.
In questions of asymptotic convergence it is fine to lump these together. In
quantitative work, we want to separate them, since we need to establish
greater control over the former than the latter. In light of this, define a
parametric version of the Prohorov metric

$$\varrho_\lambda(\pi,\pi') := \inf\{\epsilon : \pi(A) \le \pi'(A^{\lambda\epsilon}) + \epsilon \text{ for all closed } A\}. \tag{2}$$

A metric such as this is not entirely unknown in the literature; see Rachev
[eq. (3.2.22)].

There is an alternative definition, due to Strassen [Cor. to Thm 11],
of the Prohorov metric in terms of an optimal coupling. The (parameterised)
Ky Fan distance $K_\lambda(X,Y)$ between random variables (r.v's) $X, Y$ on $\Omega$ is
defined as

$$K_\lambda(X,Y) := \inf\{\epsilon : \Pr[d(X,Y) > \lambda\epsilon] \le \epsilon\}.$$

Denote by $\mathcal{L}(X)$ the law (distribution) of r.v. $X$.

Theorem 1. Suppose $\pi$ and $\pi'$ are probability distributions on $\Omega$. Then
$\varrho_\lambda(\pi,\pi')$ is the infimum of $K_\lambda(X,Y)$ over all pairs $(X,Y)$ of coupled
$\Omega$-valued r.v's such that $\mathcal{L}(X) = \pi$ and $\mathcal{L}(Y) = \pi'$.

(The theorem in this form is from García-Palomares and Giné.)

Remark 1. Strassen states Theorem 1 for the case $\lambda = 1$, but the proof
clearly holds for arbitrary $\lambda > 0$. (To avoid delving into the proof, one could
simply scale the metric $d$.) The case $\lambda = 0$ is the well-known Optimal
Coupling Theorem. It is in the proof of Theorem 1 that the technical assumption
of separability is used.
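
To make the Ky Fan distance concrete, here is a minimal sketch that evaluates $K_\lambda$ for the empirical law of a finite list of coupled samples, given only the distances $d(X_k, Y_k)$. The function name `ky_fan` and the use of bisection are choices made for this illustration. The sketch computes the distance for one fixed coupling, which by Theorem 1 is only an upper bound on $\varrho_\lambda$.

```python
def ky_fan(distances, lam, iterations=60):
    """Empirical K_lambda for coupled samples, given the distances d(X_k, Y_k)."""
    m = len(distances)

    def ok(eps):
        # Does Pr[d(X, Y) > lam * eps] <= eps hold under the empirical law?
        exceed = sum(1 for d in distances if d > lam * eps)
        return exceed / m <= eps

    if ok(0.0):
        return 0.0
    lo, hi = 0.0, 1.0              # eps = 1 always satisfies the condition
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if ok(mid):                # condition is monotone in eps, so bisect
            hi = mid
        else:
            lo = mid
    return hi
```

With `lam = 0` the value is simply the fraction of pairs that differ, matching the total variation case.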

One last definition, and we'll be ready to formalise conditions (1)-(3).
The total variation distance between two measures $\pi$ and $\pi'$ on $\Omega$ is

$$\|\pi - \pi'\|_{TV} := \varrho_0(\pi,\pi') = \sup\{|\pi(A) - \pi'(A)| : A \text{ closed}, A \subseteq \Omega\}.$$

The variation threshold time [2, §4.3] of the Markov chain $(\Omega, P)$ is
defined to be

$$\tau_1 := \min\{t : \|P^t(x,\cdot) - P^t(x',\cdot)\|_{TV} \le e^{-1} \text{ for all } x,x' \in \Omega\}.$$

The choice of threshold $e^{-1}$ is somewhat arbitrary. There are other, slightly
different notions of $\ell_1$ mixing, but they are equivalent for our purposes. In
algorithmic applications, one often estimates the probability $\pi(A)$ of some
event $A$ in the stationary distribution by taking a suitably sized sample
from the $t$-step distribution $P^t(x,\cdot)$. There are two sources of error in this
process: the sampling error, and the error occasioned by using $P^t(x,\cdot)$ in
place of $\pi(\cdot)$. The variation threshold time is important precisely because it
is a worst-case bound on the latter.
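3 Main result

Now that all the definitions are in place we can state the main result.

Theorem 2. Suppose for some $\lambda, C, \delta > 0$:

1. $\varrho_\lambda(\hat P(x,\cdot), P(x,\cdot)) \le \delta$, for all $x \in \hat\Omega$;

2. $\varrho_\lambda(P(x,\cdot), P(x',\cdot)) \le C\,d(x,x')$, for all $x,x' \in \Omega$;

3. The Markov chain defined by $P$ is ergodic, with stationary distribution
$\pi$, and variation threshold time $\tau_1$.

Then $\varrho_\lambda(\hat P^t(x,\cdot),\pi) \le \epsilon$ provided $t \ge t_\epsilon := \lceil \ln(2e/\epsilon)\,\tau_1 \rceil$ and, additionally:

• in the case $\lambda C < 1$,

$$\delta \le \frac{(1-\lambda C)\,\epsilon}{2t_\epsilon};$$

• in the case $\lambda C = 1$,

$$\delta \le \frac{\epsilon}{t_\epsilon(t_\epsilon+1)};$$

• and in the case $\lambda C > 1$,

$$\delta \le \frac{(\lambda C-1)^2\,\epsilon}{2(\lambda C)^{t_\epsilon+1}}.$$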

Remarks 2. • The key point is that if $\lambda C < 1$ then $\hat P$ does not need
to approximate $P$ to excessive accuracy, but only to within $O(\tau_1^{-1})$.
In contrast, when $\lambda C > 1$, the required accuracy scales exponentially
with $\tau_1$. So, for example, real arithmetic would have to be carried
out to a number of significant digits scaling linearly with $\tau_1$. In the
boundary situation, $\lambda C = 1$, the required accuracy scales as $O(\tau_1^{-2})$.

• All three behaviours described in Theorem 2 actually occur, and are
not artifacts of the proof. Examples will be provided in §4.

• In §5 we shall see that the ball walk, at least of the lazy kind, fits the
most favourable case, $\lambda C < 1$.

• We can recover something akin to one of Azar et al.'s results by
setting $\lambda = 0$, $C = 1$ and $d$ to be the discrete metric. Observe that
condition 2 of the theorem becomes vacuous, and $\varrho_0$ is just total
variation distance. Note that Azar et al. express their condition 3 in terms
of $\ell_2$ mixing time (spectral gap).
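
The three regimes of Theorem 2 are easy to compare numerically. The sketch below evaluates the largest $\delta$ the theorem permits for given $\lambda$, $C$, $\epsilon$ and $\tau_1$; the function name `required_delta` is hypothetical, and the power $(\lambda C)^{-(t_\epsilon+1)}$ is computed through `math.exp` only so that very small values underflow to zero instead of raising an error.

```python
import math

def required_delta(lam, C, eps, tau1):
    """Largest delta allowed by Theorem 2 for the given parameters."""
    t = math.ceil(math.log(2 * math.e / eps) * tau1)   # t_eps
    x = lam * C
    if x < 1:
        return (1 - x) * eps / (2 * t)                 # linear in 1 / t_eps
    if x == 1:
        return eps / (t * (t + 1))                     # quadratic in 1 / t_eps
    # exponential regime: (x - 1)^2 * eps / (2 * x^(t + 1))
    return (x - 1) ** 2 * eps / 2 * math.exp(-(t + 1) * math.log(x))
```

For instance, with $\epsilon = 0.1$ and $\tau_1 = 10$ one gets $t_\epsilon = 40$, so the tolerance at $\lambda C = 1/2$ is $0.05/80$, whereas at $\lambda C = 2$ it shrinks to $0.1 \cdot 2^{-42}$.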
Proof of Theorem 2. Set $t = t_\epsilon = \lceil \ln(2e/\epsilon)\,\tau_1 \rceil$. Let $(X_i)$ and $(\hat X_i)$ be
Markov chains with transition kernels $P$ and $\hat P$, respectively, starting at
a fixed state $X_0 = \hat X_0 = a \in \hat\Omega$. Note that $t$ has been chosen so that
$\varrho_\lambda(\mathcal{L}(X_t),\pi) \le \|\mathcal{L}(X_t) - \pi\|_{TV} \le \epsilon/2$. (See, e.g., Aldous and Fill [2, §4,
Lemma 5].)

We'll couple $(X_i)$ and $(\hat X_i)$ so that

$$\varrho_\lambda(\mathcal{L}(X_t), \mathcal{L}(\hat X_t)) \le \epsilon/2. \tag{3}$$

This will be possible provided $\delta$ satisfies the appropriate condition laid down
in the statement of Theorem 2. To see this, let $D_i := d(X_i, \hat X_i)$ denote the
divergence of the two Markov chains at time $i$. Consider the situation at
time $i-1$. Suppose we have constructed a realisation of the coupled process

$$(a,a) = (X_0,\hat X_0), (X_1,\hat X_1), \ldots, (X_{i-1},\hat X_{i-1}) = (b,\hat b).$$

Conditioned on $(X_{i-1},\hat X_{i-1}) = (b,\hat b)$ we have

$$\begin{aligned}
\varrho_\lambda(\mathcal{L}(X_i), \mathcal{L}(\hat X_i)) &\le \varrho_\lambda(\mathcal{L}(X_i), P(\hat b,\cdot)) + \varrho_\lambda(P(\hat b,\cdot), \mathcal{L}(\hat X_i)) \\
&= \varrho_\lambda(P(b,\cdot), P(\hat b,\cdot)) + \varrho_\lambda(P(\hat b,\cdot), \hat P(\hat b,\cdot)) \\
&\le C D_{i-1} + \delta,
\end{aligned}$$

where the final inequality uses conditions (1) and (2) of the theorem.
According to Theorem 1 we may couple $X_i$ and $\hat X_i$ so that

$$\Pr[D_i > \lambda(C D_{i-1} + \delta)] \le C D_{i-1} + \delta.$$

Iterating this construction, it follows, by induction on $i$, that

$$\Pr\Big[D_t > \lambda\delta\sum_{i=0}^{t-1}(\lambda C)^i\Big] \le \delta\sum_{i=0}^{t-1}(t-i)(\lambda C)^i. \tag{4}$$

Considering first the case $\lambda C < 1$, we may sum the series in (4) to obtain

$$\Pr\Big[D_t > \frac{\lambda\delta(1-(\lambda C)^t)}{1-\lambda C}\Big] \le \frac{\delta t}{1-\lambda C} - \frac{\lambda\delta C(1-(\lambda C)^t)}{(1-\lambda C)^2}, \tag{5}$$

which entails

$$\Pr\Big[D_t > \frac{\lambda\delta}{1-\lambda C}\Big] \le \frac{\delta t}{1-\lambda C}. \tag{6}$$

Our goal is to attain

$$\Pr[D_t > \lambda\epsilon/2] \le \epsilon/2, \tag{7}$$

since this implies inequality (3) through Theorem 1. The analysis of the
case $\lambda C < 1$ is completed by noting that to achieve the goal (7) it is sufficient
that $\delta \le (1-\lambda C)\epsilon/2t$.

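Now turn to the case $\lambda C = 1$. Summing the series in (4) in this case
yields

$$\Pr[D_t > \lambda\delta t] \le \frac{\delta\,t(t+1)}{2}.$$

We achieve (7) provided $\delta \le \epsilon/t(t+1)$.

The final case, $\lambda C > 1$, is handled in a very similar manner to the first.
In this case we find

$$\Pr\Big[D_t > \frac{\lambda\delta((\lambda C)^t - 1)}{\lambda C - 1}\Big] \le \frac{\lambda\delta C((\lambda C)^t - 1)}{(\lambda C - 1)^2},$$

and that (7) is achieved provided

$$\delta \le \frac{(\lambda C-1)^2\,\epsilon}{2(\lambda C)^{t+1}}.$$

In conclusion, we have shown that

$$\varrho_\lambda(\mathcal{L}(\hat X_t),\pi) \le \varrho_\lambda(\mathcal{L}(\hat X_t),\mathcal{L}(X_t)) + \varrho_\lambda(\mathcal{L}(X_t),\pi) \le \epsilon/2 + \epsilon/2 = \epsilon,$$

as required. If $t > t_\epsilon$, we simply delay starting the coupling until $t_\epsilon$ steps
from the end. □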
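
The only computation in the proof is the summation of the series $\sum_{i=0}^{t-1}(t-i)(\lambda C)^i$ in its three regimes. As a sanity check, the following sketch compares the direct sum with the closed forms; the function names are hypothetical, and `x` stands for the product $\lambda C$.

```python
import math

def failure_bound_sum(t, x, delta):
    """Direct evaluation of delta * sum_{i=0}^{t-1} (t - i) * x^i."""
    return delta * sum((t - i) * x ** i for i in range(t))

def failure_bound_closed(t, x, delta):
    """Closed form of the same quantity; x == 1 is the boundary case."""
    if x == 1:
        return delta * t * (t + 1) / 2
    # valid for x < 1 and x > 1 alike
    return delta * (t / (1 - x) - x * (1 - x ** t) / (1 - x) ** 2)
```

For $x < 1$ the closed form is bounded by $\delta t/(1-x)$, and for $x > 1$ by $\delta x^{t+1}/(x-1)^2$, which is where the linear and exponential dependences on $t$ come from.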



4 Counterexamples 

We demonstrate in this section that the dependence on $\tau_1$ indicated by
Theorem 2 is correct: i.e., linear in the case $\lambda C < 1$, exponential in the case
$\lambda C > 1$, and quadratic at the boundary.

In applications we are thinking mainly of uncountable state spaces. How- 
ever, for convenience, the counterexamples will all be finite Markov chains. 

4.1 "Convergent" case 

The heading is intended to indicate the case $\lambda C < 1$. We'll set $\lambda = 0$ (i.e., our
measure of convergence is total variation distance) and $C = 1$, though the
construction would work equally well for a range of $\lambda, C$ satisfying $\lambda C < 1$.

The state space in this counterexample is $\Omega := \{\omega_j : 0 \le j < n\}$.
Identify the state $\omega_j$ with the point $(n\cos(2j\pi/n), n\sin(2j\pi/n))$ in $\mathbb{R}^2$, so
that the states are equally spaced points around a circle of radius $n$. The
metric $d$ is just Euclidean distance.

Define transition probabilities for the Markov chain (from state $\omega_j$)
according to the following trial:

• With probability $1/n$, set $j' := 0$.

• Otherwise (with probability $1 - 1/n$), set $j' := (j + 1) \bmod n$.

The new state is $\omega_{j'}$. Informally, we move relentlessly clockwise around the
circle, except that with probability $1/n$ we perform a "reset" and return to the
distinguished vertex $\omega_0$. Since $C = 1$ and the Euclidean distance between
any pair of states is at least 1, condition (2) of Theorem 2 is vacuously true.

It is easy to verify, by coupling, that the variation threshold time $\tau_1$ is
$O(n)$. Simply take two copies of the Markov chain and couple the resets. A
synchronised reset occurs within $n$ steps with probability at least $1 - e^{-1}$,
so $\tau_1 \le n$. (See, e.g., Aldous [1, Lemma 3.6].)

Define $\hat P$ as $P$ but with reset probability $4/n$ in place of $1/n$. We claim
that with $\hat P$ there is significantly lower probability of observing $j \ge n/2$.
Thus the stationary distributions are quite far apart in total variation
distance (which is Prohorov metric with parameter $\lambda = 0$).

The justification of this claim runs as follows. Assume for convenience
that $n$ is even, and fix a time step $t \ge n$. The probability that we observe
$j \ge n/2$ (in the $P$ version) is at least

$$\Pr(\text{no reset in past } n/2 \text{ steps} \wedge \text{at least one reset in past } n \text{ steps}).$$

This for large $n$ is close to $e^{-1/2}(1 - e^{-1/2}) > 0.238$. In contrast, for the $\hat P$
version, the probability that we observe $j \ge n/2$ is at most

$$\Pr(\text{no reset in past } n/2 \text{ steps}),$$

which for large $n$ is close to $e^{-2} < 0.136$. Comparing with the previous bound,
it will be seen that the two stationary distributions differ by at least 0.1 in
total variation distance.

So we certainly need to insist on $\delta < 4/n - 1/n = 3/n$ if we want to
guarantee that the stationary distributions of the two Markov chains are
closer than $\epsilon = 0.1$ in variation distance. In particular, we could not replace
the $\tau_1$ factor in the first case of Theorem 2 by anything growing more slowly.
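
The claim can be checked numerically for a particular $n$. The sketch below relies on the observation, not spelled out above, that the stationary distribution of the reset walk with reset probability $p$ is proportional to $(1-p)^j$; the test of stationarity via one application of `step` guards that assumption. The value $n = 200$ and all function names are choices made for this illustration.

```python
def stationary(n, p):
    """Stationary law of the reset walk: pi(j) is proportional to (1 - p)^j."""
    weights = [(1.0 - p) ** j for j in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def step(dist, n, p):
    """Push a distribution through one transition of the reset walk."""
    new = [0.0] * n
    for j, mass in enumerate(dist):
        new[0] += p * mass                     # reset to omega_0
        new[(j + 1) % n] += (1.0 - p) * mass   # otherwise advance one place
    return new

def total_variation(a, b):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(u - v) for u, v in zip(a, b))

n = 200
pi_ideal = stationary(n, 1.0 / n)        # reset probability 1/n
pi_perturbed = stationary(n, 4.0 / n)    # reset probability 4/n
tv = total_variation(pi_ideal, pi_perturbed)
upper_half_gap = sum(pi_ideal[n // 2:]) - sum(pi_perturbed[n // 2:])
```

Both `upper_half_gap` and `tv` come out above 0.1, even though the two kernels differ by only $3/n$ in total variation at each state.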
4.2 "Neutral" case

This is the boundary case $\lambda C = 1$. The state space for this example is
$\Omega := \{\omega_{i,j} : 0 \le i,j < n\}$. Identify state $\omega_{i,j}$ with the point $(n\cos(2j\pi/n),
n\sin(2j\pi/n), 5i/n^2)$ in $\mathbb{R}^3$. (There are $n$ circles, in $n$ layers closely packed in
the $z$-dimension, each containing $n$ evenly spaced states.) The metric $d$ is
again Euclidean distance. Define

$$r(i) := \begin{cases} 1/n, & \text{if } i < n/5; \\ 5i/n^2, & \text{if } n/5 \le i < 4n/5; \\ 4/n, & \text{if } i \ge 4n/5. \end{cases}$$

Define transition probabilities (from state $\omega_{i,j}$) according to the
following trial:

1. • With probability $r(i)$, set $j' := 0$;

   • Otherwise (with probability $1 - r(i)$), set $j' := j + 1 \bmod n$.

2. • With probability 2/3, set $i' := \max\{i - 1, 0\}$;

   • Otherwise (with probability 1/3), set $i' := \min\{i + 1, n - 1\}$.

The new state is $\omega_{i',j'}$. Informally: owing to the drift in the $z$-dimension,
we quickly gravitate to the $i = 0$ layer and stay close to it. Within the layer,
we move clockwise around the cycle, except that with probability $r(i)$ we
perform a "reset" and return to one of the distinguished states $\omega_{i,0}$.

We set $\lambda = 1$, in other words we measure convergence in the standard
Prohorov metric. It is routine to verify that $\varrho(P(x,\cdot), P(x',\cdot)) \le d(x,x')$,
so that $C = 1$, and we are in the $\lambda C = 1$ (boundary) regime. (We need only
check pairs of states of the form $(x,x') = (\omega_{i,j},\omega_{i',j})$, i.e., pairs which agree in
their second index, since other pairs of states have separation $d(x,x') > 1$.
Indeed, by the triangle inequality, we need only check pairs of the form
$(x,x') = (\omega_{i,j},\omega_{i+1,j})$. There is a natural coupling of transitions from these
adjacent states $x$ and $x'$ such that the new states are within distance $5/n^2$
of each other with probability at least $1 - 5/n^2$.)

As before, we can show that $\tau_1 = O(n)$ using a coupling argument.
Consider two copies of the Markov chain started in different states. In the
first phase, couple on $i$ using the identity coupling. Coupling (of the $i$-index)
occurs at or before the first occasion at which both copies have visited the $i = 0$
layer. After this point the coupled versions always agree as to the level.
This happens with high probability within $4n$ steps. In the second phase,
we couple on $j$. We do the natural thing and synchronise the resets (just as
in the case $\lambda C < 1$). Again, we can arrange for a synchronised reset within
$2n$ steps with high probability.

Define $\hat P$ as $P$ but with the drift on $i$ reversed. The intuition is that we
quickly gravitate to layer $i = n - 1$ and remain close to it. We then circle
as before, but with much higher reset probability. We claim, as before, that
with $\hat P$ there is a lower probability of observing $j \ge n/2$. Thus the stationary
distributions are quite far apart in Prohorov metric.

The justification of this claim runs as follows. Denote by $\mathcal{E}$ the event (in
the $P$ Markov chain)

"in the previous $n$ steps, $i$ has remained in range $[0, n/5]$".

Fix a time step $t \ge 6n$. The probability that we observe $j \ge n/2$ (in the
$P$ version) is at least

$$\Pr(\mathcal{E} \wedge \text{no reset in past } n/2 \text{ steps} \wedge \text{at least one reset in past } n \text{ steps})$$
$$= \Pr(\mathcal{E})\,\Pr(\text{no reset in past } n/2 \text{ steps} \wedge \text{at least one reset in past } n \text{ steps} \mid \mathcal{E}).$$

This for large $n$ is close to $e^{-1/2}(1 - e^{-1/2}) > 0.238$. In contrast, for the $\hat P$
version, denoting by $\mathcal{E}'$ the event

"in the previous $n$ steps, $i$ has remained in range $[4n/5, n]$",

the probability that we observe $j \ge n/2$ is at most

$$\Pr(\neg\mathcal{E}' \vee \text{no reset in past } n/2 \text{ steps}) \le \Pr(\neg\mathcal{E}') + \Pr(\text{no reset in past } n/2 \text{ steps} \mid \mathcal{E}').$$

The latter probability for large $n$ is close to $e^{-2} < 0.136$. Comparing with
the previous estimate, we see the two stationary distributions differ by at
least 0.1 in the Prohorov metric.

So we certainly need to insist on $\delta < 10n^{-2}$ to bring the stationary
distributions of the two Markov chains within $\epsilon = 0.1$ in the Prohorov
metric. In particular, we could not, e.g., replace exponent 2 in the second
case of Theorem 2 by anything smaller.

4.3 "Divergent" case 

The state space here is $\Omega := \{2^{-n} i : 0 \le i < 2^n\}$, and the metric $d : \Omega^2 \to \mathbb{R}^+$
is given by $d(x,y) := |x - y|$. Define the function $G : \Omega \to \Omega$ by

$$G(x) := \begin{cases} 2x, & \text{if } x < 1/2; \\ 2(1 - 2^{-n} - x), & \text{if } x \ge 1/2. \end{cases}$$

(This is the Lorenz "tent map" of dynamical systems [eq. (2.5.2)],
adapted to the discrete situation.) Then the transition kernel

$$P(x,y) := \begin{cases} 1/2, & \text{if } y \in \{G(x), G(x) + 2^{-n}\}; \\ 0, & \text{otherwise} \end{cases}$$

defines an ergodic Markov chain with stationary distribution $\pi$ uniform on $\Omega$.
Why is this? View $x = 0.x_1x_2\ldots x_n \in \Omega$ as an $n$-bit binary fraction. Then

$$G(x) = \begin{cases} 0.x_2x_3\ldots x_n0, & \text{if } x < 1/2; \\ 0.\bar x_2\bar x_3\ldots\bar x_n0, & \text{if } x \ge 1/2, \end{cases}$$

where $\bar x_i := 1 - x_i$. That is to say, $G$ can be viewed as a left shift, followed
(possibly) by complementation. (Cf. two's complement arithmetic.) So one
step of the Markov chain can be viewed as a left shift, followed (possibly) by
complementation, and concluded by appending a random bit. Thus a sample from
$P^t(x,\cdot)$, for any $x \in \Omega$ and $t \ge n$, is a binary fraction formed of independent,
symmetric Bernoulli r.v's. We see from this argument that $\tau_1 = n$. (Notice that
distance from stationarity drops from 1/2 to 0 between time $t = n - 1$ and
time $t = n$!) Set $\lambda = 1$, and observe that $\varrho(P(x,\cdot),P(x',\cdot)) \le 2d(x,x')$, so
that we are in the $\lambda C > 1$ regime. (In light of the triangle inequality, we just need
to check pairs $(x,x')$ with $x' = x + 2^{-n}$.)

Now define an approximating Markov chain:

$$\hat P(x,y) := \begin{cases}
3/4, & \text{if } y = G(x) \text{ and } 2^{n-1}G(x) \text{ is even}; \\
1/4, & \text{if } y = G(x) \text{ and } 2^{n-1}G(x) \text{ is odd}; \\
3/4, & \text{if } y = G(x) + 2^{-n} \text{ and } 2^{n-1}G(x) \text{ is odd}; \\
1/4, & \text{if } y = G(x) + 2^{-n} \text{ and } 2^{n-1}G(x) \text{ is even}; \\
0, & \text{otherwise}.
\end{cases}$$

Note that $\varrho(\hat P(x,\cdot),P(x,\cdot)) \le 2^{-n}$.

The interpretation of the Markov chain defined by $\hat P$ in terms of binary
fractions is similar to before, only now the random bit appended is with
probability 3/4 equal to the bit immediately to its left. So, for any $t \ge n$,
$\hat P^t(x,A) = 3/4$, where $A = \Omega \cap ([0,1/4) \cup [3/4,1))$. In contrast, $P^t(x,A') =
2/3$, where $A' = \Omega \cap ([0,1/3) \cup [2/3,1))$. Now $A' \supseteq A^{\epsilon}$ with $\epsilon = 1/12$. Thus

$$\varrho(\hat P^t(x,\cdot),\pi) = \varrho(\hat P^t(x,\cdot),P^t(x,\cdot)) \ge 1/12,$$

where $\pi$ is the stationary (uniform) distribution. The bottom line is that
the transition kernels $P$ and $\hat P$ are very close and variation threshold time
is short, but that the stationary distributions of the two Markov chains are
nevertheless far apart. The exponential dependence of $\delta$ on $\tau_1$ in the third
case of Theorem 2 is unavoidable.
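
Because the state space is finite, both $t$-step distributions can be computed exactly for small $n$. The sketch below represents the state $x$ by the integer $k = 2^n x$ and uses exact rational arithmetic; the function names, the choice $n = 6$ and the start state are assumptions made for the illustration.

```python
from fractions import Fraction

def tent(k, n):
    """Discrete tent map G acting on k = x * 2^n."""
    size = 1 << n
    return 2 * k if k < size // 2 else 2 * (size - 1 - k)

def step(dist, n, p_same):
    """One transition: apply G, then append a bit that copies its left
    neighbour with probability p_same (1/2 gives P, 3/4 the perturbed kernel)."""
    new = [Fraction(0)] * (1 << n)
    for k, mass in enumerate(dist):
        if mass == 0:
            continue
        g = tent(k, n)          # always even, i.e. the last bit is 0
        left = (g >> 1) & 1     # parity of 2^(n-1) * G(x)
        for bit in (0, 1):
            p = p_same if bit == left else 1 - p_same
            new[g + bit] += mass * p
    return new

def t_step_law(start, n, p_same, t):
    """Exact t-step distribution from the state start / 2^n."""
    dist = [Fraction(0)] * (1 << n)
    dist[start] = Fraction(1)
    for _ in range(t):
        dist = step(dist, n, p_same)
    return dist

def mass_of_A(dist, n):
    """Mass of A = [0, 1/4) U [3/4, 1), i.e. the two leading bits agree."""
    quarter = (1 << n) // 4
    return sum(m for k, m in enumerate(dist) if k < quarter or k >= 3 * quarter)

n = 6
ideal = t_step_law(5, n, Fraction(1, 2), n)
perturbed = t_step_law(5, n, Fraction(3, 4), n)
```

After $n$ steps the ideal chain is exactly uniform, while the perturbed chain puts mass 3/4 on $A$ against 1/2 for the uniform distribution, in line with the argument above.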

5 Application: ball walk of Lovász and Simonovits

Recall the ball walk of Lovász and Simonovits [7] in its "lazy" version. The
situation is as follows. $K \subset \mathbb{R}^n$ is a convex body in $n$-dimensional Euclidean
space. For $x \in \mathbb{R}^n$ and $r \in \mathbb{R}^+$, $B_n(x,r)$ denotes the $n$-dimensional (closed)
ball of radius $r$ centred at $x$. Procedurally, the lazy walk $(X_t : t \in \mathbb{N})$ is described by
the following trial (where the current state is $X_t = x \in \mathbb{R}^n$):

1. Choose $y \in B_n(x,r)$ u.a.r.

2. If $y \in K$ then $X_{t+1} := y$ else $X_{t+1} := x$.

Alternatively, the transition kernel is

$$P(x,A) := \begin{cases} \mu_n(B_n(x,r) \cap (\bar K \cup A))/v_n(r), & \text{if } x \in A; \\ \mu_n(B_n(x,r) \cap K \cap A)/v_n(r), & \text{otherwise,} \end{cases} \tag{9}$$

where $\mu_n$ is Lebesgue measure, $\bar K$ denotes the complement of $K$, and $v_n(r) :=
\mu_n(B_n(0,r))$ the volume of the $n$-dimensional ball of radius $r$.
To apply Theorem 2 we want to find a constant $C$ such that

$$\varrho_\lambda(P(x,\cdot),P(x',\cdot)) \le C\,d(x,x') = C\|x - x'\|_2,$$

since $d$ is here Euclidean distance. For this part of the calculation the value
of $\lambda$ is immaterial (even $\lambda = 0$ will do), so we'll defer the choice of $\lambda$ until
later.

We could work directly from (2), but it seems easier to go via Theorem 1.
Let $d = \|x - x'\|_2$. For convenience, let $x = du_0/2$ and $x' = -du_0/2$, where
$u_0$ is the unit vector parallel to the first coordinate axis. Define a coupling
$(Y,Y')$ with $\mathcal{L}(Y) = P(x,\cdot)$ and $\mathcal{L}(Y') = P(x',\cdot)$ according to the trial

1. Choose $y \in B_n(x,r)$ u.a.r.

2. If $y \in B_n(x',r)$ then $y' := y$ else $y' := \bar y$, where $\bar y$ is the reflection of $y$
in the plane $\{z : u_0 \cdot z = 0\}$.

3. • If $y \in K$ then $Y := y$ else $Y := x$;

   • If $y' \in K$ then $Y' := y'$ else $Y' := x'$.
Note that $Y = Y'$ unless $y \in B_n(x,r) \setminus B_n(x',r)$. Now $\mu_n(B_n(x,r) \setminus B_n(x',r))$
is bounded above by the volume of an $n$-dimensional cylinder with height $d$
and cross-sectional $(n-1)$-dimensional volume $v_{n-1}(r)$. Thus $\Pr(Y \ne Y') \le
d\,v_{n-1}(r)/v_n(r)$, and hence

$$\varrho_\lambda(P(x,\cdot),P(x',\cdot)) \le \frac{v_{n-1}(r)}{v_n(r)}\,\|x - x'\|_2 = C\|x - x'\|_2,$$

where $C = \Theta(\sqrt{n}/r)$. (Note that the inequality holds for any $\lambda$, even $\lambda = 0$.)
By setting $\lambda = 1/2C = \Theta(r/\sqrt{n})$ we place ourselves in the first (most
favourable) case of Theorem 2.

Now, under the simplifying assumption that the convex body $K$ does
not have sharp corners, the variation threshold time is

$$\tau_1 = O\!\left(\frac{D^2n^2\ln(D/r)}{r^2}\right), \tag{10}$$

where $D$ is the diameter of $K$.

Remark 3. See [6, Thm 6.7 and Cor. 6.8] for more detail, including a
precise explanation of the requirement of having no "sharp corners". Note
that the radius of the ball defining the ball walk is usually denoted $\delta$; we have
used $r$ instead to avoid a notational clash. For general convex bodies $K$, the
mixing time is essentially as given in (10), but one has to take care over
the distribution of the start state of the walk, since the ball walk in its lazy
variant may get trapped for long periods near points on the boundary of $K$
of tight curvature.

From the above considerations, it can be seen that the transition kernel
$\hat P$ of the ball walk as implemented is not required to approximate the
ideal transition kernel very closely; specifically we require, according to
Theorem 2, $\varrho_\lambda(\hat P(x,\cdot),P(x,\cdot)) \le \delta$, where, for fixed $\epsilon$,

$$\delta = \Omega\!\left(\frac{r^2}{D^2n^2\ln(D/r)}\right). \tag{11}$$

This is consistent with Lovász and Simonovits's observation that for their
algorithm real numbers need only be carried to $O(\log n)$ digits.

(Footnote: In applications of the ball walk, the radius $r$ is typically of order
$1/\sqrt{n}$, so that $\lambda = \Theta(1/n)$.)

Some concise notes on how to achieve (11). Assume, as a starting point,
procedures that sample points from distributions that are close to $N(0,1)$
(Gaussian with mean 0 and variance 1) and to $U(0,1)$ (uniform on [0, 1]). A
standard approach to sampling a point u.a.r. from $B_n(0,r)$ is the following
(with step 2 omitted):

1. Let $\Phi_1,\Phi_2,\ldots,\Phi_n$ be i.i.d. samples from $N(0,1)$.

2. Let $R := \sqrt{\Phi_1^2 + \cdots + \Phi_n^2}$. If $R < \tfrac12\sqrt{n}$ declare the trial void and start
again at step 1.

3. Set $S = r(\Phi_1,\Phi_2,\ldots,\Phi_n)/R$.

4. Let $U$ be a sample from the uniform distribution on [0, 1], and return
$W = U^{1/n}S$.

We assume throughout that arithmetic is exact, in order to focus on sampling
errors. Assume that $\Phi_1,\ldots,\Phi_n$ and $U$ are sampled perfectly from
distributions $N(0,1)$ and $U(0,1)$. Then $(\Phi_1,\Phi_2,\ldots,\Phi_n)$ is distributed according
to an $n$-dimensional symmetric Gaussian distribution, and is in particular
rotationally symmetric. Thus, with or without step 2, $S$ is distributed
uniformly over the surface of $B_n(0,r)$. The final step spreads the distribution
uniformly into the interior of $B_n(0,r)$. The unusual step 2 is included to
avoid a small error being blown up in the unlikely event that $R$ is close to 0.
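
The four-step trial translates directly into code. The following is a minimal sketch using Python's standard generators in place of the "perfect" samplers; the function name `sample_in_ball` is hypothetical, and the void threshold $\tfrac12\sqrt{n}$ is an assumption of this sketch, since any threshold of that order keeps $R$ away from 0.

```python
import math
import random

def sample_in_ball(n, r, rng):
    """Sample a point uniformly from the n-dimensional ball of radius r."""
    while True:
        phi = [rng.gauss(0.0, 1.0) for _ in range(n)]   # step 1: i.i.d. N(0, 1)
        big_r = math.sqrt(sum(v * v for v in phi))
        if big_r >= 0.5 * math.sqrt(n):                 # step 2: otherwise the trial is void
            break
    s = [r * v / big_r for v in phi]                    # step 3: uniform on the sphere
    u = rng.random()                                    # step 4: U uniform on [0, 1)
    return [u ** (1.0 / n) * v for v in s]              # W = U^(1/n) * S
```

Rejecting small values of $R$ does not bias the direction of $S$, because for a rotationally symmetric Gaussian the direction is independent of the norm.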

Without loss of generality, assume that the ball walk is at the origin at time
step 0. Its location $Y$ at time step 1 is obtained by applying the rejection
rule to the r.v. $W$; explicitly, $Y = W$ if $W \in K$, and $Y = 0$ otherwise. Now
suppose that we have only approximations $\hat\Phi_i$ and $\hat U$ to the perfect samples.
Specifically, suppose

$$\varrho(\mathcal{L}(\hat\Phi_i),N(0,1)) = O(\delta/n) \quad\text{and}\quad \varrho(\mathcal{L}(\hat U),U(0,1)) = O(\delta^2), \tag{12}$$

where $\delta$, given by (11), is the deviation we are prepared to tolerate in $\mathcal{L}(\hat Y)$,
where $\hat Y$ is the approximate version of $Y$. (Specifically, we are aiming at
$\varrho_\lambda(\mathcal{L}(\hat Y),\mathcal{L}(Y)) \le \delta$.) Suppose that we run through the above trial,
replacing the perfectly distributed r.v's by their hatted, imperfect approximations,
obtaining $\hat S$, $\hat W$ and finally $\hat Y$, which arises from the rejection rule:
$\hat Y = \hat W$ if $\hat W \in K$, and $\hat Y = 0$ otherwise.

Now couple the hatted and unhatted r.v's as suggested by Theorem 1.
The build-up of errors is summarised in the following table. The penultimate
row relates to the approximate proposal move $\hat W$ for the ball walk, sampled
according to the four-step trial described earlier, and the final row to the
result of applying the rejection rule. The interpretation of (say) the third
line of the table is that we may couple $S$ and $\hat S$ so that $\|S - \hat S\|_2 = O(r\delta/n)$
with probability $1 - O(\delta)$.
Random variable $X$    $\|X - \hat X\|_2$ bounded by...    except with probability...

$\Phi_i$               $O(\delta/n)$                       $O(\delta/n)$
$R$                    $O(\delta/\sqrt{n})$                $O(\delta)$
$S$                    $O(r\delta/n)$                      $O(\delta) + O(\delta/n) = O(\delta)$
$W$                    $O(r\delta/n)$                      $O(\delta) + O(\delta) = O(\delta)$
$Y$                    $O(r\delta/n)$                      $O(\delta) + O(\delta) = O(\delta)$


The rows of the table may be checked as follows. Row 2 is straightforward.
In row 3 we need to be concerned about the trial being declared
void in the hatted trial and not in the unhatted, or vice versa. For this to
occur, $R$ must be within $O(\delta/\sqrt{n})$ of $\tfrac12\sqrt{n}$, an event whose probability may
be (crudely) bounded by $O(\delta/\sqrt{n}) \times O(1/\sqrt{n}) = O(\delta/n)$. (The density of
the r.v. $R$ is unimodal, and achieves its maximum at the point $\sqrt{n-1}$; so
the density of $R$ at $\tfrac12\sqrt{n}$ is at most $O(1/\sqrt{n})$.) In row 4, we need to
be concerned about errors being magnified when $U$ is close to 0. We deal
with this simply by giving everything away if $U = O(\delta)$. In the final row,
our concern is with the event $Y \in K$ and $\hat Y \notin K$ (or vice versa). For this
event, we must have $Y \in (K^{\eta} \setminus K) \cap B(0,r)$, where, as usual, $K^{\eta}$ denotes
the Minkowski sum of $K$ and a ball of radius $\eta$, and $\eta = O(r\delta/n)$. Now

$$\mu_n((K^{\eta} \setminus K) \cap B(0,r)) \le \mu_n(B(0,r)^{\eta} \setminus B(0,r)),$$

and so

$$\frac{\mu_n((K^{\eta} \setminus K) \cap B(0,r))}{\mu_n(B(0,r))} \le \frac{\mu_n(B(0,r)^{\eta} \setminus B(0,r))}{\mu_n(B(0,r))} = O(\delta).$$

Recall that we have set $\lambda = \Theta(r/\sqrt{n})$, from which it follows that $r\delta/n =
O(\lambda\delta)$. In summary, then, to obtain a close approximation to the ball
walk it is enough that the various samples from the Gaussian and uniform
distributions satisfy (12), where $\delta$ is given by (11).

Remark 4. It is unlikely that one would want, in the analysis of a new
algorithm, to repeat a calculation such as the one given above in a similar
level of detail. Nevertheless, it would be comforting to verify, in practical
situations, that one was working in one of the two favourable cases in
Theorem 2; it would then follow by more informal reasoning that logarithmic
(number of bits or significant digits) accuracy would suffice.